Automatic Estimation of Word Significance oriented for Speech-based Information Retrieval

نویسندگان

  • Takashi Shichiri
  • Hiroaki Nanjo
  • Takehiko Yoshimi
چکیده

Automatic estimation of word significance oriented for speech-based Information Retrieval (IR) is addressed. Since the significance of words differs in IR, automatic speech recognition (ASR) performance has been evaluated based on weighted word error rate (WWER), which gives a weight on errors from the viewpoint of IR, instead of word error rate (WER), which treats all words uniformly. A decoding strategy that minimizes WWER based on a Minimum Bayes-Risk framework has been shown, and the reduction of errors on both ASR and IR has been reported. In this paper, we propose an automatic estimation method for word significance (weights) based on its influence on IR. Specifically, weights are estimated so that evaluation measures of ASR and IR are equivalent. We apply the proposed method to a speech-based information retrieval system, which is a typical IR system, and show that the method works well.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Minimum Bayes-Risk Decoding cons for Information Retrie

The paper addresses a new evaluation measure of automatic speech recognition (ASR) and a decoding strategy oriented for speech-based information retrieval (IR). Although word error rate (WER), which treats all words in a uniform manner, has been widely used as an evaluation measure of ASR, significance of words are different in speech understanding or IR. In this paper, we define a new ASR eval...

متن کامل

On the Use of Information Retrieval Measures for Speech Recognition Evaluation

This paper discusses the evaluation of automatic speech recognition (ASR) systems developed for practical applications, suggesting a set of criteria for application-oriented performance measures. The commonly used word error rate (WER), which poses ASR evaluation as a string editing process, is shown to have a number of limitations with respect to these criteria, motivating alternative or addit...

متن کامل

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of English language, in which, nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

متن کامل

Grounded Language Modeling for Automatic Speech Recognition of Sports Video

Grounded language models represent the relationship between words and the non-linguistic context in which they are said. This paper describes how they are learned from large corpora of unlabeled video, and are applied to the task of automatic speech recognition of sports video. Results show that grounded language models improve perplexity and word error rate over text based language models, and...

متن کامل

The Effect of Word Meaning on Speech DysFluency in Adults with Developmental Stuttering

Objectives: Stuttering is one of the most prevalent speech and language disorders. Symptomology of stuttering has been surveyed from different aspects such as biological, developmental, environmental, emotional, learning and linguistic. Previous researches in English-speaking people have suggested that some linguistic features such as word meanings may play a role in the frequency of speech non...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008